# End-to-end training

Ade20k Panoptic Eomt Large 640
MIT
This paper proposes a method to reinterpret Vision Transformer (ViT) as an image segmentation model, demonstrating ViT's potential in image segmentation tasks.
Image Segmentation PyTorch
tue-mps
105
0
Ade20k Panoptic Eomt Giant 640
MIT
This model reveals the potential of Vision Transformer (ViT) in image segmentation tasks by adapting its architecture specifically for segmentation.
Image Segmentation
tue-mps
116
0
Migician
Apache-2.0
Migician is the first multi-modal large language model with free-form multi-image localization capabilities, achieving precise localization in complex multi-image scenarios and outperforming 70B-scale models.
Text-to-Image Transformers English
Michael4933
83
1
Creek
Apache-2.0
A large language model built from scratch, with fully open-source implementations of tokenizer training, model initialization, pre-training, and instruction fine-tuning.
Large Language Model Transformers
maheer
21
1
Detr Resnet 50 Sku110k
Apache-2.0
This DETR model was trained end-to-end on the SKU110K object detection dataset with the number of queries set to 400, making it suitable for scenarios such as product shelf detection.
Object Detection Transformers
isalia99
4,066
2
Segformer B0 Finetuned V0
Other
An image segmentation model based on nvidia/mit-b0 and fine-tuned on the tontokoton/artery-ultrasound-siit dataset.
Image Segmentation Transformers
Pavarissy
15
0
Encodec 24khz
EnCodec is a high-fidelity real-time neural audio codec developed by Meta AI, employing end-to-end training and supporting multiple bandwidth settings.
Audio Generation Transformers
facebook
534.08k
46
Deformable Detr Detic
Apache-2.0
An object detection model based on the deformable detection transformer (Deformable DETR) architecture, trained on the LVIS dataset with its 1,203 categories.
Object Detection Transformers
facebook
792
8
Imclasif Genres V001
This is an image classification model generated by HuggingPics, primarily used for classifying images of specific types (genres).
Image Classification Transformers
sanali209
21
0
Gender Classification
An image classification model generated by HuggingPics for identifying gender (male or female) in images.
Image Classification Transformers
Enverrr
13
0
Yolos Small Balloon
YOLOS is an object detection model using Vision Transformer (ViT) architecture, trained with DETR loss and fine-tuned on COCO and Matterport Balloon datasets.
Object Detection Transformers
zoheb
101
1
Wav2vec2 2 Bart Large No Adapter
This model is an automatic speech recognition (ASR) model trained on the LibriSpeech ASR dataset, capable of converting English speech into text.
Speech Recognition Transformers
sanchit-gandhi
22
0
Wav2vec2 2 Rnd
An automatic speech recognition model trained on the LibriSpeech ASR dataset, designed to convert English speech into text.
Speech Recognition Transformers
sanchit-gandhi
16
0
Wav2vec2 Tiny Random Robust
Apache-2.0
A lightweight automatic speech recognition (ASR) model, based on a randomly initialized version of the Wav2Vec2 architecture, designed for robustness testing.
Speech Recognition Transformers English
patrickvonplaten
406
0
Kan Bayashi Ljspeech Fastspeech2
A FastSpeech2 text-to-speech (TTS) model trained on the LJSpeech dataset with the ESPnet framework.
Speech Synthesis English
espnet
22
0
Wav2vec2 2 Bert Large No Adapter Frozen Enc
This model is a speech recognition model trained on the librispeech_asr dataset, achieving a word error rate (WER) of 2.0133 on the evaluation set.
Speech Recognition Transformers
speech-seq2seq
25
2
© 2025 AIbase